Search Engine-Crawler Symbiosis: Adapting to Community Interests
Authors
Abstract
Web crawlers have been used for nearly a decade as a search engine component to create and update large collections of documents. Typically the crawler and the rest of the search engine are not closely integrated. If the purpose of a search engine is to have as large a collection as possible to serve the general Web community, a close integration may not be necessary. However, if the search engine caters to a specific community with shared, focused interests, it can take advantage of such an integration. In this paper we investigate a tightly coupled system in which the crawler and the search engine engage in a symbiotic relationship: the crawler feeds the search engine, and the search engine in turn helps the crawler improve its performance. We show that this symbiosis can help the system learn a community's interests and serve that community with better focus.
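One way to picture the feedback loop described in the abstract is a best-first crawler whose frontier is re-scored with a term profile supplied by the search engine. The sketch below is a minimal illustration under that assumption; the `score` function, the `term_weights` feed, and the link-extraction details are hypothetical and not the authors' implementation.

```python
# Minimal sketch of a crawler/search-engine feedback loop (hypothetical,
# not the paper's implementation): the search engine hands the crawler a
# term profile built from community queries, and the crawler uses it to
# prioritize its frontier.
import heapq
import re
from urllib.parse import urljoin
from urllib.request import urlopen

def score(text, term_weights):
    """Weight a page's text by how strongly it matches the community's terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(term_weights.get(t, 0.0) for t in tokens)

def crawl(seeds, term_weights, max_pages=50):
    frontier = [(-1.0, url) for url in seeds]   # max-heap via negated priority
    heapq.heapify(frontier)
    seen, collected = set(seeds), []
    while frontier and len(collected) < max_pages:
        _, url = heapq.heappop(frontier)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except Exception:
            continue
        collected.append((url, html))           # pages handed to the index
        for link in re.findall(r'href="([^"]+)"', html):
            target = urljoin(url, link)
            if target.startswith("http") and target not in seen:
                seen.add(target)
                # Priority of a new URL reflects the community's interests,
                # here approximated by the relevance of the page linking to it.
                heapq.heappush(frontier, (-score(html, term_weights), target))
    return collected
```

In the paper's framing, the pages in `collected` would be indexed by the search engine, while `term_weights` would be periodically rebuilt from the community's query log, closing the loop.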
Similar Resources
Search Engine-Crawler Symbiosis
Web crawlers have been used for nearly a decade as a search engine component to create and update large collections of documents. Typically the crawler and the rest of the search engine are not closely integrated. If the purpose of a search engine is to have as large a collection as possible to serve the general Web community, a close integration may not be necessary. However, if the search eng...
DHT-Based Distributed Crawler
A search engine, like Google, is built using two pieces of infrastructure: a crawler that indexes the web and a searcher that uses the index to answer user queries. While Google's crawler has worked well, there are issues of timeliness and of the limited control given to end-users to direct the crawl according to their interests. The interface presented by such search engines is hence very limite...
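The entry above relies on partitioning the crawl across peers with a distributed hash table. As a rough illustration of that idea (not the paper's actual protocol), the sketch below hashes each URL's hostname onto a consistent-hash ring so that any peer can decide locally which crawler node owns a newly discovered URL; the node names are made up.

```python
# Rough illustration of DHT-style crawl partitioning (hypothetical, not the
# paper's protocol): hostnames are hashed onto a ring, and each crawler node
# owns the arc ending at its position, so URL ownership is computed locally.
import bisect
import hashlib
from urllib.parse import urlparse

def _h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class CrawlRing:
    def __init__(self, node_ids):
        self._ring = sorted((_h(n), n) for n in node_ids)
        self._points = [p for p, _ in self._ring]

    def owner(self, url: str) -> str:
        """Return the node responsible for crawling this URL's host."""
        point = _h(urlparse(url).hostname or url)
        i = bisect.bisect(self._points, point) % len(self._ring)
        return self._ring[i][1]

ring = CrawlRing(["node-a", "node-b", "node-c"])
print(ring.owner("http://example.org/page"))   # prints whichever node owns example.org
```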
A Grid Focused Community Crawling Architecture for Medical Information Retrieval Services
This paper describes a GRID-focused community crawling architecture and its possible adoption in a medical information domain. The architecture has been designed to provide an information retrieval service to individuals who are entitled to access the highly distributed computational power of the GRID, eliminating the need for a central authority/repository such as a single search engine. In ...
Designing and Implementation of "regional Crawler" as a New Strategy for Crawling the Web
With the rapid growth of the World Wide Web, the significance and popularity of search engines are increasing day by day. However, today's web crawlers are unable to update their search engine indexes as fast as the information available on the web grows, which sometimes leaves users unable to find recent or updated information. The Regional Crawler that we are proposing in this pa...
Web Crawler: Extracting the Web Data
Internet usage has increased substantially in recent times. Users find resources by following hypertext links, and this use of the Internet has led to the development of web crawlers. Web crawlers are full-text search engine components that assist users in navigating the web. These crawlers can also be used in further research activities. For example, the crawled data can be used to find missing links, ...
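The entry above mentions using crawled data to find missing links. As a small, hypothetical example of that idea (not the paper's tool), the sketch below scans stored pages for outgoing links and reports those whose targets no longer resolve.

```python
# Small sketch of one use of crawled data (hypothetical example): scan stored
# pages for outgoing links and report those whose targets return an error.
import re
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.error import URLError

def find_missing_links(pages):
    """pages: iterable of (url, html) pairs produced by a crawl."""
    missing = []
    for base_url, html in pages:
        for href in re.findall(r'href="([^"]+)"', html):
            target = urljoin(base_url, href)
            if not target.startswith("http"):
                continue
            try:
                urlopen(target, timeout=5)
            except (URLError, OSError):
                missing.append((base_url, target))   # link target unreachable
    return missing
```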